
@ArthurDeclercq (Contributor) commented Jan 9, 2026

This pull request introduces significant performance improvements and new feature extraction capabilities to the ms2rescore-rs Rust crate. The main focus is on adding parallel processing for spectrum parsing and feature computation, integrating new dependencies, and exposing new feature extraction functions to Python via PyO3.

Performance improvements (parallelization):

  • Enabled parallel processing, using the rayon crate, for reading and parsing MS2 spectra and precursor information from both the mzdata and timsrust sources.
  • Parallelization is also used in the new feature extraction functions (described below), which move computationally heavy feature calculations into Rust. Here too, parallel execution should significantly reduce compute time.
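The key property of the parallel parsing step is that spectra are processed independently while the output stays in input order. The sketch below illustrates that pattern. The PR itself uses rayon; to keep this example dependency-free it uses `std::thread::scope` instead, and `Spectrum`, `parse_spectrum`, and `parse_all_parallel` are hypothetical stand-ins, not the crate's actual types or functions.

```rust
use std::thread;

/// Hypothetical stand-in for a parsed MS2 spectrum (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
struct Spectrum {
    id: usize,
    total_ion_current: f64,
}

/// Hypothetical per-spectrum "parse" step: here it only sums the raw intensities.
fn parse_spectrum(id: usize, raw_intensities: &[f64]) -> Spectrum {
    Spectrum {
        id,
        total_ion_current: raw_intensities.iter().sum(),
    }
}

/// Parses all raw spectra on up to `n_threads` scoped threads, preserving input order.
fn parse_all_parallel(raw: &[Vec<f64>], n_threads: usize) -> Vec<Spectrum> {
    let n_threads = n_threads.max(1);
    // `chunks(0)` would panic, so the chunk size is clamped to at least 1.
    let chunk_size = ((raw.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // One worker per contiguous chunk; each computes global ids from its offset.
        let handles: Vec<_> = raw
            .chunks(chunk_size)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .map(|(i, r)| parse_spectrum(chunk_idx * chunk_size + i, r))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the output in the same order as the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<Spectrum>>()
    })
}
```

With rayon, the same order-preserving map would be roughly `raw.par_iter().enumerate().map(|(i, r)| parse_spectrum(i, r)).collect::<Vec<_>>()`, and rayon's work-stealing scheduler takes care of the chunking.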

New feature extraction and API enhancements:

  • Added a new module ms2_features with a ms2_features_from_ms2spectra function, which computes a variety of spectrum annotation and matching features (such as explained intensity, b/y ion coverage, and hyperscore) in parallel, and exposes it as a Python-callable function.
  • Added a new module ms2pip_features with a ms2pip_features_from_prediction_peak_arrays function, which computes, in parallel, all MS²PIP features required by MS²Rescore, and exposes it as a Python-callable function.
  • Registered both ms2_features_from_ms2spectra and ms2pip_features_from_prediction_peak_arrays as Python functions in the module initialization.
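To make the kinds of features listed above concrete, here is a minimal sketch of three of them: explained intensity, an X!Tandem-style hyperscore, and a Pearson correlation between predicted and observed intensities (a typical MS²PIP-style similarity feature). The `Peak` and `IonType` types are hypothetical, and the hyperscore uses one common natural-log formulation, ln(Σ matched intensity) + ln(N_b!) + ln(N_y!); the crate's exact definitions (log base, normalization, which ions count) may differ.

```rust
/// Hypothetical fragment-ion label for an annotated peak (not rustyms' types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum IonType {
    B,
    Y,
}

/// Hypothetical annotated peak: observed intensity plus an optional matched ion.
struct Peak {
    intensity: f64,
    ion: Option<IonType>,
}

/// Fraction of the total intensity carried by annotated peaks.
fn explained_intensity_fraction(peaks: &[Peak]) -> f64 {
    let total: f64 = peaks.iter().map(|p| p.intensity).sum();
    if total == 0.0 {
        return 0.0;
    }
    let explained: f64 = peaks
        .iter()
        .filter(|p| p.ion.is_some())
        .map(|p| p.intensity)
        .sum();
    explained / total
}

/// ln(n!) as a sum of logs, which avoids overflowing n! for large n.
fn ln_factorial(n: usize) -> f64 {
    (2..=n).map(|k| (k as f64).ln()).sum()
}

/// Hyperscore in natural-log form: ln(sum of matched b/y intensities) + ln(N_b!) + ln(N_y!).
fn hyperscore(peaks: &[Peak]) -> f64 {
    let (mut n_b, mut n_y, mut matched) = (0usize, 0usize, 0.0f64);
    for p in peaks {
        match p.ion {
            Some(IonType::B) => {
                n_b += 1;
                matched += p.intensity;
            }
            Some(IonType::Y) => {
                n_y += 1;
                matched += p.intensity;
            }
            None => {}
        }
    }
    if matched <= 0.0 {
        return 0.0;
    }
    matched.ln() + ln_factorial(n_b) + ln_factorial(n_y)
}

/// Pearson correlation of predicted vs. observed intensities; None if undefined.
fn pearson(pred: &[f64], obs: &[f64]) -> Option<f64> {
    if pred.len() != obs.len() || pred.len() < 2 {
        return None;
    }
    let n = pred.len() as f64;
    let mp = pred.iter().sum::<f64>() / n;
    let mo = obs.iter().sum::<f64>() / n;
    let (mut cov, mut vp, mut vo) = (0.0f64, 0.0f64, 0.0f64);
    for (p, o) in pred.iter().zip(obs) {
        cov += (p - mp) * (o - mo);
        vp += (p - mp).powi(2);
        vo += (o - mo).powi(2);
    }
    // A constant vector has zero variance, so the correlation is undefined.
    if vp == 0.0 || vo == 0.0 {
        return None;
    }
    Some(cov / (vp.sqrt() * vo.sqrt()))
}
```

Each function only reads its own spectrum, which is what makes per-spectrum parallelization over a batch straightforward.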

Python interoperability improvements:

  • Enhanced the MS2Spectrum and Precursor classes with a #[pyclass(module = "ms2rescore_rs", get_all, set_all)] attribute, a Python-compatible constructor, and a __reduce__ method for better compatibility with Python serialization (e.g., pickling). This is required for multiprocessing with MS²Rescore.
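Python's pickle protocol allows `__reduce__` to return a `(callable, args)` tuple; when unpickling, pickle calls `callable(*args)`. The contract is therefore that the returned arguments, fed back into the constructor, rebuild an equivalent object. The sketch below models that round trip in plain Rust (no PyO3) so it can run standalone; the `Precursor` fields and the `reduce_args`/`unpickle` helpers are hypothetical and do not reflect the crate's actual definitions.

```rust
/// Hypothetical, simplified stand-in for the crate's `Precursor` pyclass;
/// the field names are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Precursor {
    mz: f64,
    rt: f64,
    charge: usize,
}

impl Precursor {
    /// Mirrors the Python-facing constructor (`#[new]` in PyO3).
    fn new(mz: f64, rt: f64, charge: usize) -> Self {
        Self { mz, rt, charge }
    }

    /// Mirrors the `args` part of what `__reduce__` hands to pickle:
    /// exactly the constructor arguments, in constructor order.
    fn reduce_args(&self) -> (f64, f64, usize) {
        (self.mz, self.rt, self.charge)
    }
}

/// Simulates unpickling: pickle calls the class with the reduced arguments.
fn unpickle(args: (f64, f64, usize)) -> Precursor {
    let (mz, rt, charge) = args;
    Precursor::new(mz, rt, charge)
}
```

If a field is added to the class but not to the reduced arguments, the round trip silently drops it, which is why the constructor and `__reduce__` are best changed together.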

Package management:

  • Bumped the crate version from 0.4.3 to 0.5.0 to reflect the new features and breaking changes.
  • Introduced new dependencies: rayon, rustyms, ordered-float, and numpy in Cargo.toml.
